This paper introduces Linguistic Style Improvisation, a theory and set of algorithms for improvisation of spoken utterances by artificial agents, with applications to interactive story and dialogue systems. We argue that linguistic style is a key aspect of character, and show how speech act representations common in AI can provide abstract representations from which computer characters can improvise. We show that the mechanisms proposed introduce the possibility of socially oriented agents, meet the requirements that lifelike characters be believable, and satisfy particular criteria for improvisation proposed by Hayes-Roth.



Introduction 

Just because you are a character doesn't mean that you 
have character. Wolf to Raquel in Pulp Fiction, Q. 
Tarantino. 

Linguistic Style Improvisation (henceforth LSI) concerns the choices that speakers make about the semantic CONTENT, SYNTACTIC FORM and ACOUSTICAL REALIZATION of their spoken utterances. This paper argues that linguistic style is a key aspect of an agent's character. We present a novel theory of, and algorithms for, Linguistic Style Improvisation by computer characters.

As an example of how linguistic style can convey character, consider Victor Laszlo's request for two Cointreaux in (1), from the Casablanca screenplay excerpt in Figure 1. In the film, this request is delivered in pleasant tones.
(Laszlo and Ilsa enter Rick's Cafe)

Headwaiter: Yes, M'sieur? 

Laszlo: I reserved a table. Victor Laszlo. 

Waiter: Yes, M'sieur Laszlo. Right this way. 

(Laszlo and Ilsa follow the waiter to a table) 

Laszlo: Two Cointreaux, please. 

Waiter: Yes, M'sieur. 

Laszlo: (to Ilsa) I saw no one of Ugarte's description. 

Ilsa: Victor, I feel somehow we shouldn't stay here. 



Figure 1: Excerpt from the Casablanca script. 



(1) a. Two Cointreaux, please. 

However, consider the alternative stylistic realizations in (2) for requesting two Cointreaux:

(2) a. Bring us two Cointreaux, right away. 

b. You must bring us two Cointreaux. 

c. We don't have two Cointreaux, yet. 

d. You wouldn't want to bring us two Cointreaux, 
would you? 

Clearly, speakers make stylistic choices when they realize their communicative intentions, and their realizations express their character and personality. Based on these stylistic realizations, listeners in turn draw inferences about the character and personality of the speaker. Thus, algorithms for LSI are important for any domain in which agents speak, such as characters for interactive drama systems, multimodal interface agents and spoken dialogue agents (CPB+94; LB95; RWS+94; MDBP94; HRB94; Kam95).



Our work on LSI draws on two theoretical bases: computational work on speech acts (All79; Coh78; Lit85), and research in social anthropology and linguistics on social interaction (Gof83; BL87). The Speech Acts section introduces the components of speech act theory that we draw on; the Social Interaction and Linguistic Style section discusses Brown and Levinson's theory of linguistic social interaction in detail. We argue that these two theories in combination yield a rich generative source of different characterizations for artificial agents. The Computing Linguistic Style section then explains how these theories provide the basis for generating improvisations such as those in (2) above. The Implementing Emotional Dispositions section discusses how we augment these improvisations by selecting for the speaker an emotional disposition and its attendant acoustical correlates (Cah90). The Example Runs section illustrates how the theory is implemented in the domain of interactive story and dialogue. Finally, we discuss how LSI extends and differs from other recent approaches to both interactive drama and text generation, and propose useful extensions to our current work.

header: REQUEST-ACT(speaker, hearer, action)
precondition: WANT(speaker, action)
              CANDO(hearer, action)
decomposition-1: surface-request(speaker, hearer, action)
decomposition-2: surface-request(speaker, hearer, informif(hearer, speaker, CANDO(hearer, action)))
decomposition-3: surface-inform(speaker, hearer, ¬CANDO(speaker, action))
decomposition-4: surface-inform(speaker, hearer, WANT(speaker, action))
effects: WANT(hearer, action)
         KNOW(hearer, WANT(speaker, action))
constraint: agent(action, hearer)

Figure 2: Definition of the request-act plan operator from Litman and Allen, 1990

Speech Acts 

Speech acts were first proposed as a small set of communicative intentions, such as request or inform, that underlie all utterance production (Sea75). In any language-based application, interactive dialogue can be represented as sequences of speech acts by multiple characters. Therefore, LSI uses speech acts as the abstract representation for utterances, and plans as the basis for improvisation: each spoken utterance is represented as an instantiation of a plan operator, and these instantiations are interleaved with descriptions of physical acts in a real or simulated world.

The inventory of speech acts is defined by the application. Ours consists of the initiating acts inform, offer and two types of request: request-info and request-act. We also use three acceptance and three rejection speech acts, one for each major type of initiating act: accept-inform, accept-offer and accept-request; and reject-inform, reject-offer and reject-request.

Each speech act definition specifies (a) the conditions under which a speaker performing the speech act could succeed in achieving a communicative intention, and (b) the effects on the hearer if the speaker succeeds. Earlier computational work proposed that speech acts be implemented in a standard AI planning system as plan operators that include the act's decomposition, preconditions and effects, thereby enabling computer agents to plan utterances in the same way that they plan physical acts (All79; Coh78; Lit85). An example plan-based representation of a request-act (for example, Laszlo's request in (1a)), based on Litman and Allen's work, is given in Figure 2 (LA90).



A critical basis of our improvisation algorithms is speech act theory's distinction between the underlying intention of a speech act and the surface forms of the utterance that can realize it. This distinction is seen in Figure 2: the request-act speech act specifies an underlying intention (the desired effect) of the speaker getting the hearer to do (or want to do) a particular action, while the four decompositions specify the different ways that the underlying speech act can be realized by surface speech acts, that is, by particular sentential forms such as declarative sentences or questions. For example, the sentential equivalents of decompositions 1 to 4 in Figure 2 might be those in (3a) to (3d) respectively, where action represents an action description:

(3) a. Do action. 

b. Can you do action? 

c. I can't do action. 

d. I want action. 
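To make the mapping from decompositions to surface forms concrete, the Python sketch below encodes the Figure 2 operator as a plain dictionary and renders each decomposition through a string template modeled on (3a) to (3d). It is an illustration under stated assumptions, not the LSI implementation: the names `REQUEST_ACT`, `TEMPLATES` and `realize_decomposition` are hypothetical, formulas are stored as strings, and LSI itself generates surface forms with FUF rather than templates. Decomposition 4 is rendered as "I want you to ..." so that it works with verb-phrase action descriptions.

```python
# Hypothetical encoding of the request-act plan operator of Figure 2.
# Formulas are kept as strings; a real planner would use structured terms.
REQUEST_ACT = {
    "header": "REQUEST-ACT(speaker, hearer, action)",
    "precondition": ["WANT(speaker, action)", "CANDO(hearer, action)"],
    "decompositions": {
        1: "surface-request(speaker, hearer, action)",
        2: "surface-request(speaker, hearer, informif(hearer, speaker, CANDO(hearer, action)))",
        3: "surface-inform(speaker, hearer, ¬CANDO(speaker, action))",
        4: "surface-inform(speaker, hearer, WANT(speaker, action))",
    },
    "effects": ["WANT(hearer, action)", "KNOW(hearer, WANT(speaker, action))"],
    "constraint": "agent(action, hearer)",
}

# Illustrative sentence templates corresponding to (3a)-(3d).
TEMPLATES = {
    1: "{Action}.",              # imperative: Do action.
    2: "Can you {action}?",      # question about the hearer's ability
    3: "I can't {action}.",      # speaker asserts own inability
    4: "I want you to {action}.",  # speaker asserts the want (assumed wording)
}


def realize_decomposition(step: int, action: str) -> str:
    """Render decomposition `step` of REQUEST_ACT for a verb-phrase action description."""
    if step not in REQUEST_ACT["decompositions"]:
        raise ValueError(f"unknown decomposition: {step}")
    capitalized = action[:1].upper() + action[1:]
    # str.format ignores whichever keyword the template does not use.
    return TEMPLATES[step].format(action=action, Action=capitalized)
```

For example, `realize_decomposition(2, "bring us two Cointreaux")` yields "Can you bring us two Cointreaux?".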

Our algorithms for improvisation, to be discussed in the Computing Linguistic Style section, are mechanisms for deciding how to realize a given underlying intention as a particular surface form. While previous work on dialogue generation has focused on informational motivations and effects (MP93), we focus here on the impact of social and affective parameters on the selection of utterance form and content.

Social Interaction and Linguistic Style 

Whenever agents realize a particular speech act, they make choices about the linguistic style with which that act is realized. Our main claim is that these choices strongly affect our perception of an agent's character and personality. Given the goal of achieving a particular communicative intention in a given social setting, an agent must choose among all the possible variations in SEMANTIC CONTENT, SYNTACTIC FORM and ACOUSTICAL REALIZATION. We call such a set of choices a strategy for realizing a particular communicative intention.

The generative account we present is derived from Brown and Levinson's theory of social interaction (BL87), in which they identify a number of different variables and give examples of how different values for these variables produce different communicative outcomes. In LSI, we take their framework, refine its specification where necessary, and specify the computational mechanisms required to implement it.¹

Maintaining public face An important basis of the theory is that all agents have, and know each other to have:

1. Face: An agent's public self-image, which consists of the desire for:

(a) Autonomy: Freedom of action and freedom from imposition by other agents;

(b) Approval: A positive consistent self-image or personality that is appreciated and approved of by other agents;

2. Capabilities for rational reasoning such as means-end reasoning, deliberation, and plan recognition.

Social variables and face Given the desire to maintain their own and others' face, and beliefs about their own and others' rationality, the agents' algorithm for choosing a strategy for realizing a particular speech act relies on evaluating three socially determined variables:

1. D(S,H): the social distance between the speaker S and the hearer H.

2. P(H,S): the power that the hearer has over the speaker.

3. R_a: a ranking of imposition for the act a under discussion.

Human agents use personal experience, background knowledge, and cultural norms to determine the values for these variables. For example, social distance often depends on how well S and H know one another, but also on social class and status. Power comes from many sources, but often arises from one agent's ability to control access to goods that the other agent wants, such as money.

The ranking of imposition relies on the fact that all agents' basic desires include the desire for autonomy and approval. Thus particular speech act types can be ranked as higher impositions simply by how they relate to agents' basic desires.

Speech acts that can function as a threat to H's desire for autonomy include those that predicate some future act of H, as well as those that predicate some future act of S toward H, such as offers, which put pressure on H to accept or reject them. This means that the act types request-info, request-act and offer threaten H's desire for autonomy. The inform speech act also threatens H's desire for autonomy, on the basis that it is an attempt by S to affect H's mental state.

Speech acts that threaten H's desire for approval include all rejections, including the act types reject-inform, reject-offer and reject-request.²
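The classification of speech act types by the face want they threaten can be captured as a lookup table. This is a minimal Python sketch, assuming string names for act types; `FACE_THREATS` and `threatened_wants` are hypothetical names, and the acceptance acts, which the text does not classify, simply return an empty set.

```python
AUTONOMY, APPROVAL = "autonomy", "approval"

# Hearer face wants threatened by each speech act type, following the text.
FACE_THREATS = {
    "request-info": {AUTONOMY},    # predicates a future act of H
    "request-act": {AUTONOMY},     # predicates a future act of H
    "offer": {AUTONOMY},           # pressures H to accept or reject
    "inform": {AUTONOMY},          # attempts to affect H's mental state
    "reject-inform": {APPROVAL},   # all rejections threaten approval
    "reject-offer": {APPROVAL},
    "reject-request": {APPROVAL},
}


def threatened_wants(act_type: str) -> set:
    """Return the hearer's face wants threatened by act_type (empty if unclassified)."""
    return set(FACE_THREATS.get(act_type, set()))
```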

Given our inventory of speech acts, and the range of the variables D and P, we instantiate the theory with the ranking of imposition R_a based on the speech act type, as shown in Figure 3.³

inform
request-info
offer
reject-offer
reject-inform
reject-request
request-act

Figure 3: A ranking R_a on imposition of various types of speech acts with values from 1 to 50.



Linguistic style strategies and social variables 

As social and rational actors, S and H attempt to avoid threats to one another's face. Given values for social distance D(S,H), power P(H,S) and ranking of imposition R_a, the agent S estimates the threat Θ to H of performing the speech act a by simply summing these variables, as in equation (4):

(4)  $\Theta = D(S,H) + P(H,S) + R_a$

Once a value for Θ has been calculated, the agent uses it to choose one of the following four strategies for executing a speech act:⁴

(5) a. Direct: Do the act directly;

b. Approval-Oriented: Orient the realization of the act to H's desire for approval;

c. Autonomy-Oriented: Orient the realization of the act to H's desire for autonomy;

d. Off-Record: Do the act off record by hinting, and/or by ensuring that the interpretation of the utterance is ambiguous.

¹ Due to space constraints, we are unable to present a full exegesis of their theory here; the interested reader is referred to (BL87).

² Other speech acts not in our inventory, such as criticisms and complaints, also threaten H's desire for approval (BL87).

³ The values we use here serve to illustrate the model and range of phenomena. The actual values of the ranking of imposition need to be empirically determined with respect to the culture being modeled. We also discuss in our concluding section how R_a should be a function of both speech act type and propositional content, rather than purely of speech act type as we do here.

⁴ Brown and Levinson also include a strategy of not executing the speech act at all because the face threat is too great.

The lowest values of Θ lead to the direct strategy and the highest values to the off-record strategy. In LSI, each of the social variables D, P and R_a has a maximum value of 50, so Θ can be at most 150. Direct strategies correspond to Θ values up to 50, approval-oriented strategies to values from 51 to 80, autonomy-oriented strategies to values from 81 to 120, and off-record strategies to values from 121 to 150.⁵
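Equation (4) and the strategy ranges above can be sketched as follows. The thresholds 50, 80, 120 and 150 come from the text; the lower bound of 0 on each variable is an assumption (the examples below use values of 0), and `theta` and `choose_strategy` are hypothetical names.

```python
def theta(d: int, p: int, r: int) -> int:
    """Equation (4): Θ = D(S,H) + P(H,S) + R_a."""
    for value in (d, p, r):
        # Maximum of 50 is from the text; the minimum of 0 is assumed.
        if not 0 <= value <= 50:
            raise ValueError(f"social variable out of range: {value}")
    return d + p + r


# Upper bound of each strategy's Θ range, in increasing order.
STRATEGY_RANGES = [
    (50, "direct"),
    (80, "approval-oriented"),
    (120, "autonomy-oriented"),
    (150, "off-record"),
]


def choose_strategy(t: int) -> str:
    """Return the strategy whose Θ range contains t."""
    for upper, name in STRATEGY_RANGES:
        if t <= upper:
            return name
    raise ValueError(f"theta out of range: {t}")


t = theta(4, 0, 45)            # Laszlo and Emil as old friends, request-act
print(t, choose_strategy(t))   # prints: 49 direct
```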

Each strategy can be realized by a wide range of sub-strategies, whose semantic content is selected from the plan-based representation for a speech act and whose syntactic form is selected from a library of syntactic forms. Since there are many ways to realize each strategy, realizations within a strategy's range are heuristically assigned to the upper or lower end of that range, or several realizations are assigned the same value to support random variation.

Emotion as an element of linguistic style 

Varying the affect of the spoken realization is a critical aspect of linguistic style. Although Brown and Levinson state that expressions of strong emotion threaten both S's and H's desires for approval and autonomy, they do not further specify the relation between strategies for selecting semantic content and syntactic form, and those for selecting the ACOUSTICAL realizations in the utterance, which most directly express emotions. In order to explore this interaction, we adopt a very simple view of emotional expression: emotional disposition is a dimension orthogonal to the social variables, and each character is simply assigned an emotional disposition at the start.



Computing Linguistic Style 

Because LSI is defined on the basis of speech act types alone, what we have described so far is domain independent. However, the content of each speech act is domain specific. For example, in Figure 2, domain-specific content is represented by the action variable in the definition of request-act. Similarly, the domain-specific content in an inform speech act is represented by a proposition variable. Thus, to test LSI, specific domains must be represented in terms of the actions and propositions of that domain. For example, Figure 5 represents the domain-specific action of serving two Cointreaux.

We have tested LSI on speech acts derived from two domains: a task-oriented dialogue in which two agents discuss furnishing a two-room house (Wal96a), and a segment of the Casablanca script shown in Figure 1 (Wal96b; WABM95).⁶

As shown in Figure 4, LSI takes as input a sequence of speech acts representing a dialogue, and a SOCIAL STRUCTURE, which consists of a value of at most 50 for both social distance D and power P for each pair of agents in the dialogue. Then, for each speech act in the script or the dialogue, the speaker determines the social distance D between him/herself and the hearer, the power P that the hearer has over him/her, and the value of R_a for the speech act type as in Figure 3. The speaker then calculates the value of Θ by equation (4), and uses it to select one of the strategies given in (5) above.
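The per-act loop described above might look like the following Python sketch. It assumes a dialogue given as (speaker, hearer, act type) triples and dictionaries keyed by ordered agent pairs, where `power[(H, S)]` stores P(H,S). All names are hypothetical; only the request-act ranking of 45 and the social structure values come from the text, and the accept-request ranking in the usage example is a made-up placeholder.

```python
from typing import Dict, List, Tuple


def select_strategies(
    dialogue: List[Tuple[str, str, str]],
    distance: Dict[Tuple[str, str], int],
    power: Dict[Tuple[str, str], int],
    ranking: Dict[str, int],
) -> List[str]:
    """For each (speaker, hearer, act_type), compute Θ and pick a strategy."""
    chosen = []
    for speaker, hearer, act_type in dialogue:
        # Θ = D(S,H) + P(H,S) + R_a
        t = distance[(speaker, hearer)] + power[(hearer, speaker)] + ranking[act_type]
        if t <= 50:
            chosen.append("direct")
        elif t <= 80:
            chosen.append("approval-oriented")
        elif t <= 120:
            chosen.append("autonomy-oriented")
        else:
            chosen.append("off-record")
    return chosen


# Social structure of the first example run (A = Laszlo, B = the waiter).
dialogue = [("A", "B", "request-act"), ("B", "A", "accept-request")]
distance = {("A", "B"): 0, ("B", "A"): 30}
power = {("B", "A"): 0, ("A", "B"): 30}
ranking = {"request-act": 45, "accept-request": 10}  # 10 is a hypothetical placeholder
print(select_strategies(dialogue, distance, power, ranking))
# prints: ['direct', 'approval-oriented']
```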

We will now demonstrate how the algorithm operates by showing how different linguistic strategies result from different social structures. In each case we will use the example from Casablanca in which Laszlo orders two Cointreaux from Emil, and assume that the algorithm operates on the representations in Figures 2 and 5.⁷ Since there are many more realizations of the strategies than can be discussed here, interested readers are referred to (BL87).



Direct strategies 

Direct strategies result from social structures in which both social distance D and power P are small. In the case of our two Cointreaux example, imagine that Laszlo and Emil are old friends, and that Emil, as the waiter, has no power over Laszlo. This could be modeled in our framework with a social structure in which the social distance D between Emil and Laszlo is 4 and the power P that Emil has over Laszlo is 0. According to Figure 3, the R_a for request-act is 45. Using equation (4) and the values for D, P and R_a, the value of Θ is 4 + 0 + 45 = 49, leading Laszlo to select a direct strategy for realizing his request.

The realizations for all direct forms, irrespective of speech act type, are based on the semantic content of the decomposition step of the speech act. Each speech act type has an associated default syntactic form. For example, in the case of request-acts we assume that the default syntactic form is an imperative.⁸ Thus the simplest strategy for realizing a direct form is the REALIZE-DIRECT-STRATEGY: Realize the content of the decomposition step with its associated default syntactic form. For a request such as Two Cointreaux, please, this would result in an utterance such as:

(6) Bring us two Cointreaux.

Figure 4: Overview of LSI Algorithm

header: SERVE(waiter, customer, two-cointreaux)

precondition: HAS(restaurant, two-cointreaux)

decomposition: BRING(waiter, customer, two-cointreaux)

effects: HAS(customer, two-cointreaux)

Figure 5: A possible plan in the restaurant domain for serving two Cointreaux

⁵ Again, these values are estimates selected for illustrative purposes.

⁶ The task-oriented dialogue representation is generated off-line by a planner, while the Casablanca script speech act representation is constructed by hand. In both cases, we use the generator FUF (Elh92) to generate surface forms. Because FUF does not operate directly on the predicate logic representations used in plans, we augment these with manually generated FUF equivalents. Future implementations will include a transducer that generates FUF forms automatically from plan representations.

⁷ Actually, we will derive some of the decompositions in Litman's definition by rule (AP80; GL71).

⁸ For speech acts such as inform, the default syntactic form is a declarative sentence, and for speech acts which are subtypes of accept or reject, the default forms are Okay, Yes or No, respectively.

Direct realizations can also be ordered within the direct range (Θ up to 50), so that lower values correspond to styles conveying that H has no power (P is low). One way to make a request-act in this style is the POWER-DIRECT-STRATEGY: Add you must or right away to the direct form. This is illustrated in (7) and (8):

(7) Bring us two Cointreaux right away. 

(8) You must bring us two Cointreaux. 
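A template-level sketch of the REALIZE-DIRECT-STRATEGY and POWER-DIRECT-STRATEGY for request-acts follows, assuming the decomposition content is already available as an English verb phrase (in LSI it comes from the plan and is generated by FUF). The function names are hypothetical.

```python
def _capitalize(text: str) -> str:
    return text[:1].upper() + text[1:]


def realize_direct(content: str) -> str:
    """REALIZE-DIRECT-STRATEGY for a request-act: the default imperative form."""
    return _capitalize(content.strip()) + "."


def power_direct(content: str, marker: str) -> str:
    """POWER-DIRECT-STRATEGY: add "you must" or "right away" to the direct form."""
    content = content.strip()
    if marker == "you must":
        return f"You must {content}."
    if marker == "right away":
        return f"{_capitalize(content)} right away."
    raise ValueError(f"unsupported marker: {marker!r}")
```

With the content "bring us two Cointreaux", these functions reproduce (6), (7) and (8).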

Approval oriented strategies 

Approval-oriented strategies result from social structures in which there are minor differences in both power P and social distance D between the interactants, so that these factors play a weak role in strategy selection. Strategies for orienting the realization of a speech act to the hearer's desire for approval include intensifying interest or attention to H, implying that S and H are cooperators who have the same perspective or desires, or conveying that S and H are part of the same social group or are friends.

One way to convey that S and H have the same desires when making a request is the OPTIMISM-APPROVAL-STRATEGY: S expresses optimism that H will want to do what S wants H to do. This strategy results from selecting the semantic content to be realized from the WANT(hearer, action) effect of the request-act (as in Figure 2)⁹ and realizing this semantic content with a declarative sentence that includes a tag question. This strategy results in surface forms such as:

(9) You'd like to bring us two Cointreaux, wouldn't you?

One way to imply that S and H are in the same social group, and that S believes that the relative P between himself and H is small, is the GROUP-APPROVAL-STRATEGY: Use in-group address forms such as buddy, mate, honey, doll, my man, depending on the group. For a request, this is implemented by concatenating an in-group address form, my man, to the direct realization of the speech act, resulting in surface forms such as:

(10) Hey Emil, my man, bring us two Cointreaux.
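The two approval-oriented request strategies can likewise be sketched as string templates. The functions below are hypothetical, take English phrases rather than plan terms, and reproduce (9) and (10).

```python
def optimism_approval(action: str) -> str:
    """OPTIMISM-APPROVAL-STRATEGY: assert WANT(hearer, action) with a tag question."""
    return f"You'd like to {action}, wouldn't you?"


def group_approval(direct_form: str, hearer: str, address: str = "my man") -> str:
    """GROUP-APPROVAL-STRATEGY: prepend an in-group address form to a direct form."""
    body = direct_form[:1].lower() + direct_form[1:]
    return f"Hey {hearer}, {address}, {body}"
```

Swapping the `address` argument (buddy, mate, and so on) is how a character's social group could vary the output under this sketch.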

For accept-offer or accept-request speech acts, approval-oriented forms are those that explicitly assert the want effect of the offer or request speech act, such as:

(11) I'd be glad to.

and

(12) With pleasure.

⁹ A similar strategy of assuming that the effect already holds can also be used for inform speech acts.

For rejections, approval-oriented forms are those by which H affirms a social relationship with S, such as:

(13) I'm sorry, I can't. Normally I'd love to.

Autonomy oriented strategies 

Autonomy-oriented strategies result from social structures in which there are significant differences between the two agents in either power P or social distance D. Under these circumstances S will choose strategies that make minimal assumptions about H's wants and desires, leave H the option not to do the act, and dissociate S from possible infringement of H's autonomy. Note that the effect field in Figure 2 encodes information about H's wants and desires. Thus, one rule is to be pessimistic about H's desires. This can be achieved by selecting semantic content from this effect field with the NEGATE-EFFECT-AUTONOMY-STRATEGY: State that the want effect doesn't hold. This produces a form such as:

(14) You wouldn't want to bring us two Cointreaux, would you?

In addition, note that the precondition field in Figure 2 encodes information about H's abilities. One way of leaving H the option not to do the act is for S to produce a query with this precondition as the semantic content, leaving H the option of saying that s/he is unable to do the act. This is the QUERY-ABILITY-AUTONOMY-STRATEGY, which results in forms such as:

(15) Can you bring us two Cointreaux?

One way of dissociating S and H from an autonomy infringement is to produce an indirect form of a request with the ASSERT-WANT-PRECONDITION-AUTONOMY-STRATEGY: State that the want precondition holds. This results in forms such as:

(16) We'd like two Cointreaux.

Another strategy for avoiding an autonomy infringement is the IMPERSONALIZE-ACTOR-AUTONOMY-STRATEGY: Impersonalize who actually performs the requested act. This results in proposals with no actor specified. It is also possible to produce proposals in which the act itself is unspecified, by selecting the semantic content for the request from the effect field of the domain act. For example, in Figure 5, the effect is that the customer has two Cointreaux. Using this field as the semantic content results in surface forms such as:

(17) Let us have two Cointreaux.
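Under the same template-level assumptions, the four autonomy-oriented request strategies can be sketched as functions that each take the phrase that would be selected from the relevant field of Figure 2 or Figure 5 and produce forms (14) to (17). All function names are hypothetical, and the wording of each template is copied from the examples rather than generated.

```python
def negate_effect(action: str) -> str:
    """NEGATE-EFFECT-AUTONOMY-STRATEGY: the WANT(hearer, action) effect doesn't hold."""
    return f"You wouldn't want to {action}, would you?"


def query_ability(action: str) -> str:
    """QUERY-ABILITY-AUTONOMY-STRATEGY: query the CANDO(hearer, action) precondition."""
    return f"Can you {action}?"


def assert_want_precondition(obj: str) -> str:
    """ASSERT-WANT-PRECONDITION-AUTONOMY-STRATEGY: the speaker's want holds."""
    return f"We'd like {obj}."


def impersonalize_actor(effect_obj: str) -> str:
    """IMPERSONALIZE-ACTOR-AUTONOMY-STRATEGY via the domain effect HAS(customer, obj)."""
    return f"Let us have {effect_obj}."
```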

Inform speech acts also have realizations that are autonomy oriented. An inform speech act can impinge on H's autonomy concerning what s/he wants to believe. One way to orient to H's autonomy is to soften the strength of an assertion by HEDGING it (PFB80).



For example, consider Laszlo's utterance of I reserved a table. This can be hedged by simply embedding the declarative sentence, which is produced from the decomposition step of the plan for an inform, with hedging phrases such as I feel, I believe, It seems, As you may know, I think, I heard, or by adding other hedges such as somehow, sort of, kind of to the verb phrase. This strategy is encapsulated in (18) and produces forms such as (19):

(18) HEDGE-INFORM-STRATEGY: Augment any inform statement with either a pre-sentential or a verbal hedge.

(19) I believe I reserved a table.
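A minimal sketch of the pre-sentential half of the HEDGE-INFORM-STRATEGY follows, using the hedging phrases listed above and a seeded random generator to supply the random variation LSI relies on. Verbal hedges such as somehow or sort of are not handled, the capitalization and comma rules are simplifying assumptions, and `hedge_inform` is a hypothetical name.

```python
import random

# Pre-sentential hedges listed in the text.
HEDGES = ["I feel", "I believe", "It seems", "As you may know", "I think", "I heard"]


def hedge_inform(sentence: str, rng: random.Random) -> str:
    """Embed a declarative sentence under a randomly chosen pre-sentential hedge."""
    hedge = rng.choice(HEDGES)
    body = sentence.strip().rstrip(".")
    # Lower-case the first letter unless the sentence starts with the pronoun "I".
    if not (body == "I" or body.startswith(("I ", "I'"))):
        body = body[:1].lower() + body[1:]
    separator = ", " if hedge == "As you may know" else " "
    return f"{hedge}{separator}{body}."
```

Passing a seeded `random.Random` keeps a run reproducible while still letting different seeds yield different hedges for the same inform.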

An example of hedging in the original script (Figure 1) is Ilsa's assertion:

(20) Victor, I feel somehow we shouldn't stay here.

Hedging the strength of the assertion can also function as an approval-oriented strategy, since it is a simple way to avoid disagreement.

Off record strategies 

Off record strategies result from social situations in which there are significant values for social distance D or major discrepancies in power P between two agents, or from an act that is a large imposition on H. Tactics for going off record are difficult to implement because they involve indirect inference paths that are difficult to model computationally. There are, however, several simple ways to make a request off record by constructing hints from plan-based representations. One strategy is the ASSERT-NEGATION-DOMAIN-EFFECT-STRATEGY, in which S asserts that the effect of the domain plan does not hold, as in:

(21) We don't have two Cointreaux yet.

Another strategy is the ASSERT-DOMAIN-PRECONDITION-HOLDS-STRATEGY: Assert that the precondition of the domain plan holds. For example, Laszlo's utterance of I reserved a table is a statement that the domain precondition for being shown to a table holds. Thus the original realization in the script is an off record form.

Another strategy is the ABSTRACT-AGENT-AND-NEGATE-EFFECT-STRATEGY: Select the semantic content as the decomposition of the domain plan, abstract the agent role, and negate the assertion of the decomposition. This leads to an implicature (Hir85). The result is shown below:

(22) Someone hasn't brought us two Cointreaux.
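The two hint-constructing off record strategies can be sketched over a hand-written, phrase-level encoding of the Figure 5 plan. The dictionary layout and function names are hypothetical, and the morphology (the negated "have", the participle "brought") is supplied by hand rather than generated.

```python
# Hypothetical phrase-level encoding of the SERVE plan in Figure 5.
SERVE = {
    "recipient": "us",                      # the customer, from the speaker's viewpoint
    "object": "two Cointreaux",
    "decomposition_participle": "brought",  # from BRING(waiter, customer, object)
}


def assert_negation_domain_effect(plan: dict) -> str:
    """Hint that the effect HAS(customer, object) does not hold yet."""
    return f"We don't have {plan['object']} yet."


def abstract_agent_and_negate(plan: dict) -> str:
    """Abstract the agent of the decomposition to 'someone' and negate it."""
    return (f"Someone hasn't {plan['decomposition_participle']} "
            f"{plan['recipient']} {plan['object']}.")
```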

In the current implementation of LSI, autonomy-oriented forms are sometimes substituted for off record forms in order to provide more variability when characters choose to go off record.



Implementing Emotional Dispositions 

Once a character's emotional disposition has been set, all of that character's utterances are synthesized with the acoustical correlates of that emotion. We implement this by drawing on Cahn's theory of expressing affect in synthesized speech (Cah90), and use a version of her Affect Editor program developed expressly for interactive theater and simulated conversation.

The Affect Editor computes instructions for a speech synthesizer (so far, DECtalk 3 and 4.1) so that it produces emotional and expressive synthesized speech. The input is a combination of text and acoustical parameter values; the output is a set of synthesizer instructions. The parameters (seventeen in all) control the presence in the speech signal of various aspects of pitch, timing, voice quality and phoneme quality.

Because some of the acoustical properties are moderated by linguistic properties of the text, the words in the text must be annotated for part of speech and for focus information (expressed as a likelihood of receiving intonational stress, that is, as the inverse of the accessibility of items in memory), and the text itself must then be marked with all possible phrase boundaries according to syntax and grammatical role.

The acoustical parameters have numerical values. Their adjustment around zero, which represents neutral affect, allows various shadings of emotional expression, for example, from calm to sad to completely dejected, or from enthusiasm to harsh anger. Our current LSI implementations make use of parameter value sets for seven emotional dispositions: Angry, Annoyed, Disgusted, Distraught, Gruff, Pleasant and Sad.
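The text says only that parameter values are adjusted around zero to obtain shadings of an emotion. One simple way to realize such shadings, assumed here rather than taken from the Affect Editor, is to scale a disposition's parameter offsets by an intensity factor. The parameter names and values in `SAD_LIKE` are placeholders, not Cahn's actual settings, and `shade` is a hypothetical name.

```python
from typing import Dict


def shade(profile: Dict[str, float], intensity: float) -> Dict[str, float]:
    """Scale a disposition's offsets from neutral (all zeros) by intensity.

    intensity 0.0 yields neutral affect; 1.0 yields the full profile.
    """
    if intensity < 0:
        raise ValueError("intensity must be non-negative")
    return {name: value * intensity for name, value in profile.items()}


# Placeholder profile: hypothetical parameter names and values, for illustration only.
SAD_LIKE = {"pitch_range": -4.0, "speech_rate": -3.0, "loudness": -2.0}
```

Under this assumption, increasing the intensity moves a character from a mildly sad delivery toward a completely dejected one, mirroring the shadings described above.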

Example Runs of Linguistic Style Improvisation

To demonstrate the effect of LSI, we apply it to the first five lines of the Casablanca script in Figure 1, where agent A is Laszlo and agent B is the waiter. We provide an underlying abstract representation for this excerpt in terms of speech acts, as specified in Figure 6. We use extreme power and social distance parameter settings in the examples to demonstrate the range of variation that is possible.

A direct/angry speaker with an approval-oriented/pleasant hearer In a social structure in which A's emotional disposition is angry and B's is pleasant, modeled by setting D(A,B) = 0, P(B,A) = 0, D(B,A) = 30, and P(A,B) = 30, A will choose direct strategies and an angry delivery, and B will choose approval-oriented strategies, delivered in pleasant tones. The result of this social structure applied to the Casablanca excerpt is:

(23) W: Could I help you?

L: You must take us to a table. I am Victor Laszlo.

W: It's a pleasure.

L: Bring us two Cointreaux, right away.

W: I'd be glad to.

An autonomy-oriented/distraught speaker with a direct/pleasant hearer In a social structure where A's emotional disposition is distraught and B's is pleasant, modeled by setting D(A,B) = 40, P(B,A) = 40, D(B,A) = 0, and P(A,B) = 0, A will choose autonomy-oriented strategies and a distraught delivery, and B will choose the lower end of direct strategies and a pleasant delivery. The effect of this social structure on the Casablanca excerpt is:

(24) W: I will help you.

L: Can you take us to a table? As you may know, I am Victor Laszlo.

W: Yes, if you insist.

L: You wouldn't want to bring us two Cointreaux, would you?

W: Yes, if I must.

The values that produce (24) portray Laszlo as a wimp, for several reasons. First, Laszlo, who is the customer, is orienting to the waiter's autonomy. Second, the distraught delivery is very high pitched and tentative. Finally, the fact that the waiter is rude highlights their differences in linguistic style.

Related Work

There are two areas of related work: recent work on interactive drama systems, in particular Hayes-Roth's work on improvisation by computer characters; and the longer-running body of work on natural language generation.

Interactive drama systems In empirical studies of human reactions to lifelike computer characters, Nass et al. (NST95) show that linguistic style leads to specific inferences about character. However, they rely on pre-scripted linguistic forms to demonstrate its effects, and no generative mechanism is supplied. Other work in this area, for example that of Maes et al. (MDBP94) and Loyall and Bates (LB95), has focused on the behavior of non-speaking animals, so that linguistic style has not been considered. Where characters do speak, their utterances are mainly pre-scripted (BL93), or generation does not focus on variations in linguistic style (CPB+94).

Hayes-Roth's work on improvisation does allow for linguistic variation, but this arises by selection from a finite set of forms, and again no generative mechanism is given (HRB94; HRBS95). However, this work provides a useful set of requirements for improvisation mechanisms of computer characters (HRBS95), which our mechanisms for LSI satisfy:

1. Interesting variability in a character's interpretation of a given direction on different occasions;

2. Random variability in the way a character performs a specific behavior on different occasions;

3. Idiosyncrasies in the behaviors of different characters;

4. Plausible motivations for a character's behavior;

5. Recognizable emotions associated with a character's behaviors and interactions.



(Laszlo and Ilsa enter Rick's Cafe)

Headwaiter: Yes, M'sieur? (offer)

Laszlo: I reserved a table. Victor Laszlo. (request-act)

Waiter: Yes, M'sieur Laszlo. Right this way. (accept-request)

(Laszlo and Ilsa follow the waiter to a table)

Laszlo: Two Cointreaux, please. (request-act)

Waiter: Yes, M'sieur. (accept-request)



Figure 6: Assumed Speech Acts for an excerpt from the Casablanca script.

The dialogues in 23 and 24 demonstrate that social structure variables produce interesting variability, random variability, and idiosyncrasies. In addition, because Brown and Levinson's theory is based on empirical observation of human interaction in many cultures, a theory of LSI based on it satisfies Hayes-Roth's last two criteria. Since the theory captures linguistic universals, human users should be able to ascribe plausible motivations to a character's behavior and recognize the emotions associated with it. Moreover, the motivations the theory ascribes are not only descriptive and explanatory, but also predictive and generative.
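As an illustration of how an abstract speech act such as those in Figure 6 could serve as the input from which a character improvises, the sketch below pairs a speech act type with a strategy and picks one surface form at random, one simple way to obtain the random variability of Hayes-Roth's second criterion. The lookup table is a hypothetical stand-in, not the paper's generation mechanism: some entries reuse utterances from the Casablanca examples, others are invented, and the names `REALIZATIONS` and `realize` are illustrative.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical realization table keyed by (speech act, strategy).
# Some strings come from the Casablanca examples; others are invented.
REALIZATIONS: Dict[Tuple[str, str], List[str]] = {
    ("request-act", "direct"): [
        "Bring us two Cointreaux, right away.",
        "Two Cointreaux, please.",
    ],
    ("request-act", "autonomy"): [
        "You wouldn't want to bring us two Cointreaux, would you?",
        "Could you possibly bring us two Cointreaux?",
    ],
    ("accept-request", "direct"): [
        "Yes, if I must.",
        "Yes, if you insist.",
    ],
}


def realize(act: str, strategy: str, rng: Optional[random.Random] = None) -> str:
    """Pick one surface form for a speech act realized under a strategy."""
    rng = rng or random.Random()
    options = REALIZATIONS[(act, strategy)]
    # A random pick yields different wording on different occasions.
    return rng.choice(options)


if __name__ == "__main__":
    # Speech acts from Figure 6, paired with illustrative strategies.
    script = [
        ("Laszlo", "request-act", "autonomy"),
        ("Waiter", "accept-request", "direct"),
    ]
    rng = random.Random(0)
    for speaker, act, strategy in script:
        print(f"{speaker}: {realize(act, strategy, rng)}")
```

Passing a seeded `random.Random` makes a run reproducible, while leaving it out lets the same scripted speech act come out differently each time it is performed.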

Text generation. Previous work on natural language generation has addressed the problem of how surface forms can be generated from underlying speech acts (... 1P93; Coh78; Dal88), inter alia. However, in the main, the variables that determine linguistic choice have all been task-related. Generation research has therefore addressed the role of linguistic choice in indicating information structure, foregrounding and backgrounding information, and reducing cognitive overload, and the impact of these factors on inducing change in the hearer's beliefs. This task-oriented perspective ignores other aspects of choice and interaction, namely agents' motivations and socially appropriate responses and behavior.

One exception is the work of Hovy (Hov93), who does consider the effect of social factors on generation. However, Hovy is concerned with generating news stories (text), which in speech act terms are sequences of INFORM speech acts. In contrast, our work focuses on the generation of conversation, which requires a much wider range of speech acts. Furthermore, the news story genre affords fewer opportunities for social factors to affect generation, given the anonymity of the generic text reader.



Discussion and Future Work

In this paper, we have argued that linguistic style is an under-researched aspect of character, and presented a theory of, and algorithms for, Linguistic Style Improvisation by computer characters. This work expands the set of parameters that have been investigated in research on natural language generation of conversational speech.

One interesting extension of our work would be to introduce social feedback into our model, allowing linguistic actions to directly affect the SOCIAL structure in the course of an interaction. We hope to explore a reciprocal feedback loop to social structure, in which, for example, one agent's linguistic friendliness results in another agent adjusting their beliefs about social distance, and hence changing the second agent's future linguistic strategies. This should result in interpretable and interesting changes in the way two agents treat one another over the course of a social interaction. We also hope to examine in more detail the relationship of acoustical expression of emotions to choices about linguistic semantic content and syntactic form.

Another possible extension concerns a more complex function for calculating the ranking of imposition R_α. The problem is that R_α should be a function of both the speech act type and the type of the action in the domain. For example, a request-act that H pass the salt is less of an imposition than a request-act that H give S five dollars. We conjecture that a function for R_α could be based on inputs α and a domain act δ, if the speech act planner could access information about the effort involved in executing the domain act δ.

In sum, we have shown how LSI can be applied to computer characters in both interactive fiction and task-oriented dialogue simulation. In future work, we hope to investigate applying the same mechanisms to characters that act as personal assistants in spoken language interfaces (BL93; Kam95; YLM95). We believe that the combination of dimensions we have focused on provides a motivated and artistically interesting basis for making choices about linguistic style, that these choices are closely related to human perceptions of character and personality, and that they provide a rich generative source of linguistic behaviors for lifelike computer characters.
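The conjectured dependence of R_α on both the speech act α and the domain act δ can be sketched as a small function. This is a hypothetical illustration only: the weight table, the effort estimates, the multiplicative combination, and the name `ranking_of_imposition` are assumptions chosen so that the salt example orders as described, not a proposal from the paper.

```python
from typing import Dict

# Hypothetical weights for how imposing each speech act type is in itself.
SPEECH_ACT_WEIGHT: Dict[str, float] = {
    "inform": 0.1,
    "offer": 0.3,
    "request-act": 1.0,
}

# Hypothetical effort estimates a planner might look up for domain acts.
DOMAIN_ACT_EFFORT: Dict[str, float] = {
    "pass-salt": 5.0,
    "give-five-dollars": 30.0,
}


def ranking_of_imposition(speech_act: str, domain_act: str) -> float:
    """Assumed R_alpha(alpha, delta): speech act weight scaled by domain effort."""
    return SPEECH_ACT_WEIGHT[speech_act] * DOMAIN_ACT_EFFORT[domain_act]


if __name__ == "__main__":
    salt = ranking_of_imposition("request-act", "pass-salt")
    money = ranking_of_imposition("request-act", "give-five-dollars")
    # The salt request ranks as less of an imposition than the money request.
    print(salt, money)
```

A multiplicative form is only one option; it keeps a low-effort domain act cheap regardless of speech act type, whereas an additive form would give every request some fixed baseline cost. Either value could then stand in for a fixed R_α when a speaker's strategy is computed.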